ZhiYuan Research Institute Jointly Builds the Chinese Internet Corpus (CCI) to Provide Data Resources for the Big Data and Artificial Intelligence Industries
ZhiYuan Research Institute, in collaboration with TuoSi and ZhongKe WenGe, has built the Chinese Internet Corpus (CCI). The corpus has been strictly screened and cleaned, totals 104 GB of data, and covers the period from 2001 to 2023. ZhiYuan Research Institute will continue to expand its data sources and improve its data processing workflows to provide more high-quality, reliable data resources. The institute has also openly released other high-quality Chinese datasets, including the WUDAO corpus, COIG, and MTP. The initiative aims to support the big data and artificial intelligence industries.